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Abstract —Active subspaces are an emerging set of tools for 
identifying and exploiting the most important directions in the 
space of a computer simulation’s input parameters; these direc¬ 
tions depend on the simulation’s quantity of interest, which we 
treat as a function from inputs to outputs. To identify a function’s 
active subspace, one must compute the eigenpairs of a matrix 
derived from the function’s gradient, which presents challenges 
when the gradient is not available as a subroutine. We numerically 
study two methods for estimating the necessary eigenpairs using 
only linear measurements of the function’s gradient. In practice, 
these measurements can be estimated by finite differences using 
only two function evaluations, regardless of the dimension of the 
function’s input space. 

I. Active subspaces 

Modern physics and engineering simulations take several 
inputs—e.g., boundary conditions, material properties, and 
forcings—and output several quantities of interests. The sci¬ 
entist uses these simulations to study the relationship between 
inputs and outputs. Uncertainty quantification seeks precise 
characterization of the simulation’s quantities of interest sub¬ 
ject to variability in the inputs. These characterizations often 
reduce to parameter studies—such as optimization, integration, 
or response surface modeling—that treat the simulation as a 
mapping between inputs x and a quantity of interest /(x). 
However, thorough parameter studies quickly become infea¬ 
sible as the dimension of x grows, particularly if evaluating 
/(x) (i.e., running the physical simulation) is computationally 
expensive. To combat this curse of dimensionality, one may 
seek a low-dimensional parameterization of /(x) that (i) main¬ 
tains the input/output representation and (ii) enables otherwise 
infeasible parameter studies. One idea is to identify the least 
important input parameters and fix them at nominal values, 
thus reducing the dimension of the parameter study. Such 
identification is the domain of sensitivity analysis, and several 
techniques exist that use a few simulation runs to screen the 
inputs’ importance—such as local perturbations, elementary 
effects, or sensitivity indices Q. A more general approach 
is to identify important linear combinations of the inputs x 
and focus parameter studies along the associated directions. 
Active subspaces are defined by important directions in the 
high-dimensional space of inputs; once identified, the scientist 
can exploit the active subspace to enable otherwise infeasible 
parameter studies for expensive simulations Q. 

Assume that x € K™ is a vector of simulation inputs, 
and let the input space be equipped with a probability density 
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function p(x) that is strictly positive in the domain of / and 
zero outside the domain. In practice, p identifies the set of 
inputs of interest and quantifies the variability. We assume 
that the independent inputs have been shifted and scaled to be 
centered at the origin and have equal variances. Assume that 
/ : M"* —K is continuous, square-integrable with respect to p, 
and differentiable with gradient vector V/ € also assume 
that /’s gradient is square-integrable with respect to p. The 
active subspace is defined by the first n < m eigenvectors of 
the following mx m symmetric positive semi-definite matrix, 

C = y VfVfpdx = WAW'^, (1) 

where the non-negative eigenvalues are ordered in descending 
order. The eigenvalue Xi measures the average change in / 
subject to perturbations in x along the corresponding eigen¬ 
vector Wi, 

Xi = J(y pdx. (2) 

Eor example, if Xi = 0, then / is constant along the direction 
Wi. If / is constant along a direction, then one can ignore this 
direction when studying the behavior of / under changes in 
X. Suppose that the first n < m eigenvalues are much larger 
than the trailing m — n, and let Wi be the first n columns 
of the orthogonal eigenvector matrix W. Then a reasonable 
approximation for / is 

/(x) « g{W^x), (3) 

where p is a properly constructed map from K" to K. In Q, 
we study a particular construction of g, its approximation 
properties, and how those properties change when Wi is 
estimated. 

There are several recent works that study models similar 
to © in uncertainty quantification 0^ 0^ computational 
engineering |j^-0, and approximation theory pO) , fTT) . In 
statistics, subspace-based dimension reduction has wide use in 
regression modeling under the monikers sufficient dimension 
reduction 0 and efficient dimension reduction 0- Many in 
that community have recognized the connection between the 
dimension reduction space and the matrix C in Q; see 0^ 
0- However, in contrast to our case where the gradient 
V/ is given, the gradient of the regression function with 
respect to predictors must be learned. Additionally, there is no 
random noise in computer simulations in contrast to regression 
modeling. 




The tremendous potential benefits of dimension reduction 
drive us to pursue methods to estimate the eigenvalues A and 
eigenvectors W from ([T]). In | [T6) , we analyze the following 
Monte Carlo method. First, draw M samples {x^} indepen¬ 
dently according to p. For each x^, compute V/i = V/(xi). 
Then compute 

-I At _ 

~ (4) 

i=l 

In p6) , we use tools from nonasymptotic random matrix 
theory to study the approximation error in the estimated 
eigenvalues and subspaces 03, )T8). The approach assumes 
that one has access to the gradient V/(x) as a black box, 
which is not unreasonable in modern simulation codes due to 
adjoint methods or algorithmic differentiation 03- However, 
many legacy simulation codes lack gradient routines, and one 
must estimate the eigenpairs with only evaluations of /(x). 

IT Estimating active subspaces without 

GRADIENTS 

When the gradient is not available as a black box, one might 
estimate the gradient through evaluations of /, e.g., with 
finite differences or related techniques p0| . A first-order finite 
difference approximation to the gradient vector requires m + 1 
evaluations of /, so estimating C in Q takes M (to- 1-1) eval¬ 
uations, which may be prohibitively expensive for large-scale 
simulations. To surmount this challenge, we take advantage of 
a finite difference approximation of directional derivatives. For 
sufficiently small h > 0 and a G K"*, 

V/(x)^a « (/(x-h ft.a) -/(x))//i. (5) 

The left side of Q can be interpreted as a linear measurement 
of the gradient vector, and the right side takes only two 
evaluations of /—regardless of the dimension to. We study 
two techniques that exploit ([^l to estimate Wi from Q using 
only evaluations of /. The relationship Q is also exploited 
in the works 03, 0 to reconstruct functions of the form 
/(x) = g{Ax.), where A G Note the subtle difference 

between this reconstruction and the approximation model in 
•ID- Our goal in this paper is to estimate Wi, not propose 
a choice for g in (0. Another important difference is in the 
linear measurement operator; GD use the same operator 
for each V/i in Q, while we use an independent measurement 
operator for each V/^. 

A. Eigenvectors from projections 

Our first approach is based on work by Qi and Hughes pT| for 
estimating principal components from linear measurements of 
a collection of vectors. Suppose that G M’”, i = 1,..., M, 
are independent, zero-mean random vectors of the form 

d 

Zj = '^WijajVj, Wy ^ A/'(0,1), (6) 

where the Vj are orthonormal vectors. Let g k < 

TO, have independent standard Gaussian entries, and define the 
measurements m.i — Efzi. Then the orthogonal projection 
of Zi onto the column space of Ei is 

r^z, := E.iEfE,)-^m, = E,{eJE f)-^ eJZ i. (7) 


Qi and Hughes |21 


vectors of the matrix 


Theorem 2] show that the first d eigen- 


^^V,z,{V.z,f ( 8 ) 

i=l 

converge to the vectors vi,..., as M goes to infinity. The 
eigenvalues converge to quantities related to ai from (|3- In 
fact, their results show how to estimate the mean and principal 
components even when the model (|3 has a nonzero mean and 
random noise. 


There is no reason to suspect that the gradient vectors 
V/, from 0 satisfy the model 0. Nonetheless, we can 
numerically check if the eigenvectors of the matrix 


1 


(9) 


2 = 1 


where Vi is defined as in Q, are close to the eigenvectors of 
0 for chosen test problems. If so, we can exploit the finite 
difference relationship 0 to efficiently estimate each element 
of the fc-vector EfS/fi, which is the analogue of m.i in 0. In 
this case, we can estimate the eigenvectors W from 0 using 
M{k -1-1) < M{m + 1) evaluations of /. Note that, by the 
analysis in 121 , we do not expect the eigenvalues of C-p to 
converge to those of C. 


B. Low-rank approximation from linear measurements 

Our second approach is based on low-rank approximation 
of the matrix of gradients G = [V/i,..., V/m] G 
using linear measurements of the gradients. Define the linear 
measurement operator A4{-) as 

MiG) := [EfWh ••• E^WfM], (10) 

where Ei G M"*xfe independent standard Gaussian entries 
as in the previous section. Let r be the rank of the low-rank 
approximation. We seek matrices A G K^xr ^ ^ j^Mxr 
that solve 

immrmze \\MiG) — M{AB'^)\\f^ (11) 

A, B 

where || • 11^ is the Lrobenius norm. We estimate the minimizers 
with alternating least-squares. Given A, •mi' is a linear least- 
squares problem for B. Similarly, given B, ([n]) is a linear 
least-squares problem for A. We choose a starting value for 
A as the first r eigenvectors, scaled by the square-roots of the 
first r eigenvalues, of Gp in 0. In this sense, the alternating 
least-squares can be considered an iterative refinement on the 
estimates from the first method. Once A and B are estimated, 
we compute the left singular vectors of AB^ to estimate Wi 
in 0. 

The low-rank model AB^ requires the user to choose 
the rank r. To solve the least-squares subproblem for A 
without additional regularization, we need r less than the 
number k of linear measurements (i.e., the number of columns 
in Ei). Additionally, we are guided by the ultimate goal 
of the dimension reduction, which is to construct g in 0. 
Without prior knowledge of the relationship between / and x, 
constructing a response surface is generally too expensive in 
more than a handful of dimensions. Thus, we may reasonably 
keep r less than 8 or 9. If there is no gap within the first 8 or 
9 eigenvalue estimates, then 0 may be inappropriate. 




III. Experiments 


We test the two methods on two functions: (i) a quadratic 
polynomial in m = 10 dimensions and (ii) a quantity of 
interest from the solution of a PDE whose operator coefficients 
depend on m = 100 variables. In each experiment, we compute 
C from Q with a fixed number M of gradient samples, and 
we consider the eigenpairs W and A to be the true values. We 
compute the error in the first six eigenvalue estimates and the 
error in the subspaces defined by the eigenvectors estimates as 


and 


, Ell A? I 


( 12 ) 


|WiWf - WiW^ 


| 2 , 


(13) 


respectively, where Ai and W are the eigenvalue and eigen¬ 
vector estimates, respectively, from the linear measurement- 
based approaches. The subscript 1 on the matrices W and W 
indicates that they contain only the first n columns; we note n 
when needed. We do not expect the eigenvalue estimates from 
the projection-based method in Section |II-A to converge, but 
we report them to compare with the errors from the alternating 
least-squares approach. We repeat the study 20 times with 
independently drawn Gaussian measurement matrices Ei. The 
resulting errors are averaged over these 20 trials. We do not 
study the approximation properties of the finite differences in 
<0, e.g., by varying the finite difference parameter h. Instead, 
we examine cases where the gradient is available, thus focusing 
on the performance of the linear measurement-based methods 
when the answer is known. 


A. Quadratic function 

Let H G be symmetric and positive semidefinite, and 

let /(x) = Ax^ffx, defined on the domain x G [—1,1]^° 
with a uniform density p. The gradient is Vfipf) = ffx. The 
eigenvectors of C from (0 are the eigenvectors of H, and 
the eigenvalues of C are the eigenvalues of H, squared and 
divided by 3. We construct H so that its eigenvalues decay at 
a slow spectral rate, except for a large gap between the third 
and fourth eigenvalues; this indicates that the active subspace 
is three-dimensional. We compute C from Q using M = 200 
gradient samples. We study the quality of the estimates as the 
number k of measurements goes from 4 to 9. Note that fc = 10 
measurements would produce a perfect reconstruction, almost 
surely. Eor the alternating least-squares method, we choose the 
rank r = 4. The caption in Eigure describes the contents of 
the subhgures showing the results. In general, we see the least- 
squares method outperform the projection-based method once 
the number of measurements exceeds the rank r = 4. 


by TO = 100 independent parameters x G via a truncated 
Karhunen-Loeve expansion of a Gaussian random field with 
a relatively long correlation length. This implies that p is a 
standard Gaussian density in 100 dimensions. The quantity of 
interest /(x) is the average of the solution u over the right 
side of the spatial domain. The solution is approximated with 
a well-resolved hnite element method, and the gradient V/(x) 
is computed with an adjoint scheme. More information on this 
problem can be found in Q. 

The matrix C from 0 is estimated with M = 300 gradient 
samples. The particular quantity of interest has a dominant 
one-dimensional active subspace; in other words, there is a 
large gap between the first and second eigenvalues of C. We 
study the approximations as the number k of measurements in¬ 
creases by 20 from 10 to 90. Eor the least-squares method, we 
choose the rank r = 8. The caption in Eigure describes the 
contents of the subhgures showing the results. In general, the 
alternating least-squares method outperforms the projection- 
based method. Note the sharp decrease in the error once the 
number k of measurements exceeds 50. 

IV. Conclusion 

We have studied the approximation properties of two methods 
for estimating the eigenvectors and eigenvalues that dehne 
a multivariate function’s active subspace. These methods use 
linear measurements of the gradient instead of the full gradient, 
which can be efficiently computed with hnite differences. The 
hrst method, based on work by Qi and Hughes pT) , uses 
random projections of the gradient. The second uses a low-rank 
approximation of the matrix of gradient measurements ht with 
an alternating least-squares method. The low-rank approxima¬ 
tion performs better than the projection-based method in two 
numerical tests. Euture work will (i) analyze the improvement 
with the low-rank approximation and (ii) study the effect of 
hnite difference approximations of the linear measurements. 
These initial results are sufficiently promising to pursue such 
studies. 
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Let u = u(s, x) solve the Poisson equation in two spatial 
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with homogeneous Dirichlet boundary conditions on the left, 
top, and bottom of the domain and homogeneous Neumann 
conditions on the right side of the domain. The spatially 
varying operator coefficients a = a(s,x) are parameterized 
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